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(54) METHOD FOR ANALYZING GENE EXPRESSION DATA 

(57) Abstract

PROBLEM TO BE SOLVED: To provide a method and equipment for knowledge 
search based on gene expression data (also called a gene expression profile) using a 
DNA micro-array or the like. 

SOLUTION: The knowledge search is done through: a process receiving the gene
expression data; a process receiving class information; a process extracting a
gene group related to class classification by using a data mining technique; a
process executing annotation with respect to the gene group; a process extracting
the common rule of the gene group related to the class classification based on the
gene annotation; and a process executing the data mining using constraint
conditions based on the common rule.

Japanese Laid-Open Patent Publication
No. 30093/2004 (Tokukai 2004-30093)

A. Relevance of the Above-identified Document 

The following is a partial English translation of exemplary 
portions of non-English language information that may be 
relevant to the issue of patentability of the claims of the present 
application. 

B. Translation of the Relevant Passages of the Document 

See also the attached English Abstract. 

[INDUSTRIAL FIELD OF THE INVENTION] 

The present invention relates to a method and an 
apparatus for finding information based on gene expression 
data (also referred to as "gene expression profile") using a DNA 
microarray or the like. 
[0002] 

[PRIOR ART] 

Research is underway to find gene functions through gene
expression profile analysis and to obtain information useful for
drug development, pharmacology, toxicology, and diagnosis. For
the analysis of DNA chip data, the following techniques have
been used, for example: statistical analyses such as correlation
analysis, principal component analysis, and analysis of
variance; clustering methods such as k-means clustering,
hierarchical clustering, and self-organizing maps; and
classification algorithms such as nearest neighbor, discriminant
analysis, support vector machines, neural networks, and genetic
algorithms (see
Laura J. van't Veer et al., Gene expression profiling predicts
clinical outcome of breast cancer, Nature 415, pp. 530-536
(2002); Scott L. Pomeroy et al., Prediction of central nervous
system embryonal tumor outcome based on gene expression,
Nature 415, pp. 436-442 (2002)).
[0003] 

[PROBLEMS TO BE SOLVED BY THE INVENTION] 
However, there is no established means for simultaneously
analyzing large numbers of genes and obtaining information from
them. The information obtained by the analysis depends greatly
on the knowledge of the analyst, so different analysts do not
always obtain the same information. Further, the amount of data
obtained from the DNA chip is so large that it exceeds the
analytical capability of humans.

[0004] 

An object of the present invention is to provide a method 
and an apparatus for analyzing a gene expression profile set. 
More specifically, the invention provides a technique for 
extracting, from a gene expression profile set, a group of genes 
that are useful for drug development, pharmacology, toxicology, 
and diagnosis, and a technique for finding laws common to 
such groups of useful genes. 
[0005] 

[MEANS TO SOLVE THE PROBLEMS] 

In order to extract a group of useful genes from a gene
expression profile set, and find laws common to such groups of
useful genes, the invention performs the steps of (1) receiving
gene expression data, (2) receiving class information, (3)
extracting groups of genes associated with class classification,
using a data mining method, (4) performing annotation on the
groups of genes, (5) finding laws common to the groups of genes
associated with class classification, based on gene annotation,
and (6) performing data mining using restricting conditions that
are based on the common laws. It is important that the
procedure from step (3) to step (6) be repeated. Conventional
information systems employ steps (1) through (4).
Conventionally, the groups of genes associated with class
classification and their annotations are presented to an
analyst in the form of, for example, a list or a graphical
interface. The subsequent step of obtaining information has relied
on the knowledge and intuition of the analyst. A problem with this
approach, however, is that analysts differ in their level of
knowledge. Another problem is that the amount of information
obtained from a DNA chip gene expression profile is so large
that the analysis takes a long time. Accordingly, there is a need
for an information system that assists an analyst in obtaining
information.
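
The repetition of steps (3) through (6) can be illustrated in code. The following Python sketch is an illustration only, under stated assumptions: the function names (extract_genes, common_terms, analyze), the scoring rule (absolute difference of class means), the threshold min_fraction, and the toy data are all hypothetical and are not taken from the publication, which does not fix a particular data mining method at this point.

```python
from collections import Counter
from statistics import mean


def extract_genes(expr, classes, candidates, top_n):
    """Step (3): pick the candidate genes that best separate the two classes.

    Assumption: the score is the absolute difference of class means; any
    other data mining method could be substituted here.
    """
    def score(gene):
        class0 = [v for v, c in zip(expr[gene], classes) if c == 0]
        class1 = [v for v, c in zip(expr[gene], classes) if c == 1]
        return abs(mean(class0) - mean(class1))

    return sorted(candidates, key=score, reverse=True)[:top_n]


def common_terms(genes, annotations, min_fraction):
    """Steps (4)-(5): annotate the genes, keep terms shared by enough of them."""
    counts = Counter(t for g in genes for t in annotations.get(g, ()))
    return {t for t, n in counts.items() if n / len(genes) >= min_fraction}


def analyze(expr, classes, annotations, top_n=3, min_fraction=0.6, max_rounds=5):
    """Repeat steps (3)-(6) until the selected gene group stops changing."""
    candidates = list(expr)
    selected, laws = [], set()
    for _ in range(max_rounds):
        new_selected = extract_genes(expr, classes, candidates, top_n)
        if new_selected == selected:
            break
        selected = new_selected
        laws = common_terms(selected, annotations, min_fraction)
        # Step (6): restrict the next round to genes satisfying a common law.
        constrained = [g for g in expr if annotations.get(g, set()) & laws]
        if constrained:
            candidates = constrained
    return selected, laws


# Hypothetical toy input standing in for steps (1) and (2).
expr = {
    "g1": [1.0, 1.1, 3.0, 3.2],
    "g2": [2.0, 2.1, 2.0, 2.2],
    "g3": [0.5, 0.4, 2.0, 2.1],
    "g4": [1.0, 1.0, 2.0, 2.0],
}
classes = [0, 0, 1, 1]
annotations = {
    "g1": {"membrane"},
    "g2": {"nucleus"},
    "g3": {"cytoplasm"},
    "g4": {"membrane"},
}

selected, laws = analyze(expr, classes, annotations)
print(selected, laws)
```

In this toy run the first pass selects three genes, two of which share an annotation; the second pass is constrained to genes carrying that annotation, which is the sense in which the common law feeds back into the data mining step.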
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1-**fc^»<0#»Mfc*cf<ft3C*#*S*EC A^iM^JIRi-safK^, ^E 
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»J - A* S F A -C* »K * 5^^Si:l»at*afi?lfO*aftlflt:i^ < tt*C*#* 

iwi— ^-^-7 r 5 u - s FAKURi-fc»e-3% & l < ttfltriEamjt^t tfrie^iwi-^ 
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1 5] 

1 6] 

ft* ate-?, «b l< fimiEfflmit^tmiE^^Mi^tf$&^^Pi^ s ^B?^^jt{5^ 
t oa**tr*# t t* i t t ts»$s 7 ib«w jae^mT* - ^»*f^s 
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^ &mKm?&&%*®&tc2><Offlfttf%ZflX\,>2, 0 Wx-lf, Wlf (Co r r e 
latipn Analysis), ±B-?5r#f (Principal Compone 
n t A n a l y s i s ) , #ffc5!r#T (Analysis of Variance).*. 
t*<D&itfM%f; k ?7X^J>^ (k-mean Cluster i n g ) , MM ? 

(Hierarchical Clustering), £ v '7 7* 

(Sel f-organizing Map) ^ 7 7 * 'J > ^\ ^fej5#fe (N e 

arest Neighbor), ^JSOfr^T (Discriminant Ana I ys 
i s) , t **- (Support Vector Machine)., 

-a-5H?f7-^ (Neural Network), jtteittT ;V^'J XA (G e 
netic Algorism) i t'OM7 'J X A ^ D N A f -^f - ^ ^)Mf I: 
jbfJ L£zffltf$> & o Laura J. van' t V e e r Gene expre 
ssion profiling predicts clinical outcom 
e of breast cancer. Nature 415, pp. 530 — 5 
36 (2 0 02) ^, Scott L. Pomeroy^, Prediction 
of central nervous systemembryonal tumor 
outcome based on gene expression. Natur 
e 415, pp. 436-442 (2002) 
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y^h&h tiz> r~ 9 &tmxx$> v) , ^^<b K^<Dffi$xifci3*mtx^z> t^o mm 
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&m-tz>%Lmzmih-f2>tztbi l z, (i) ita^^x-^^s»t^^xfi, (2) 7^7 

tf$g££»tlpi£lfl, (3) x-?^->'7~^ft£ffl^T7 7 7^K|1^1-&&{5T' 

^£$m-r&ifi, (4) ffrffii£teT-3£KTy ^fi^i®, (5) Ife^ry 

IS (3) (6) <7)XfI^^m-t-2»^t^S5T^2»o &£o1f*&->*x AfiXfl 1 ^ 

h 4 t-e^^ToTV^Tto 7 7 7^tCili^-r^3taT-^t ^C0T7 X-~>3 7KOV>T, 

i-zK^ftT^tzo *<v&<D%im%:&<Dj:mit, m%x%<D%\m.t%m^M^x^tz 0 
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±IB<Ol»«**l£*).&Oo L^LDNAf 
He (-9->X;Hfc : £#'U' TMJifa-Hfc (ate?* : Srt-*5>SOT) 

X;WC#LTfi^iB9K&|fc-$-&e:i:;& s #v»J t fc 5>*i/cv»*. vT^Siti. fll&l'l 
0-BO3»e-?-"CSlWCSft»t*ttf> 1 0 Oli. 2 0 Oflta^-^if^^ifcT^o^Ji 

Mit^ff SHfeWg;^ ut, ^-LTM^fit (Inaccurate values) #4? L 

*LT^*ii^>f->^fe** s ^:i < ^feoT Li ? (n/^hr*vO 
htLX^&o Z(Dtz£>, MtyftT-f ZMR-fZ Z.L (Data Selection) 
. to J: O t #MJfKijR|g#*#x.aiR :t (Data Cleansing) *f 

CD liirMgrintg i & & <> Data Selection. Data Cleans 

i n g<DVfeiLLtzJofeli%:^tf, 1k<DK nowledge Discover y CDJ^SS: 

gHU (1) Th'jea-h^M^^^^^lt, (2) 7 MJ e.*- HttsMg3£1&j&* 
-f&v^ (3) Ii9ti:tLn^ f-e^v^-e. Data Selection.. D at 
a C 1 e a n s i n gliSfttlS^Stv>9 3 o<D\kWX JsUJ^ 
cofE^-Cli:. Data Selection, Data C 1 eans i ngH ttbX 

[0 0 0 7] 

i&B^mr- ^ tit, ^Sx.lfDNAfyT'ft (D N A Chip), for 

^^Wfe (Differential Display), £i»PC Rft (Q 
uantitative P C R) . SAGE (Serial Analysis of 
Gene E x p r e s s i o n) ft. Xn f- 4 y -y ;/ft (Protein Chi 

P ) fc^tMftfltft^fc l< im&n<7>&m.mk*mfc-rz>-%mz£ 

[0 0 0 8] 

* 9 * til* tit, WiifDNAf ? Xft# T'llJjE 3 ft tzftm. * * tz&)<D tit*T* & 

o «it»*iMS^**>^* , *k*3lE#OJil*-e*4»^ L < lite 

*A&*m>1* m***-C*ittf 1. «*Afi*T*^tf 0 fcSe^T&o 

4fc»aftlA*^aif^S% *^*^Olfe«lO*tta£*0, l, 2, 3 4^i:5Ett*i&„ 

1 . mm&s- 1 2mmk<n^y-7)\s* 2, m9o& j *2 4vzm®tw z^t-t^m 

* 7*#!W|--C;S>fttf, «BS£*fr*fc4:*'HA, Wi^otfe, 1^-^*^ 
ft^'ttSf. ISI (^W^-fX) $:ttS^tfSo 
[0 0 0 9] 

i:«tHtSgf(?)ri:^4, ^J£^5e^ (Decision Tree) , 
WX (Naive Bayes) , 7M4 , ->7> (Fully B a y e s i a n) . 
fflK^-^ (Association rule). (Charac t e_r i 

stic rule). EM^ 5 7.* V > 7~ (EM Clustering). 
ft (Nearest Ne ighbor) . fUS'J 5r#r (Discriminant An 
alys i s) , f^'-h^H-^y^ (Support Vector Mac h i 
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n e ) . ittefi*j TA'^'JXA (Genetic Algorism) > l&l&mM (Lin 
earRegression) <OZ.t^i> 0 >& L < Ji_tffi#ft£4fcffl1 _ *±-C* 
7 (B a g g i n g) . VST- (Bo o s t i n g) . T.*?^'? (S t a c 

king) ^^f^^fltS^H^So 
[0010] 

[0 0 1 1] 
[0012] 

7-f/u*y BtrlB^<t d K;g151fcT f -?£®#iL, ^^S^#^^x.^m i 9fiS 

(#!lx_tf 5 0 0 ^T<7)) S§-£\ L|V^3-;V5 0 OtL, 5O0«T^f^ 
Jg£ 5 0 0 L < IHfnet**^il:^*4, 

[0013] 

^ n*;<c»j t*--> 3 > (Cross-Val idation) fl!)x_ »f f V 7 * - 

n7,;t'J t*— -y 3 > (tenfold cross — val idat ion) > 
< f± l ) — 7*7 > 7 7 h 9 n 7*— v 3 y (Leave-one-out cross 
-validation) (7)it^5o f >7*-/PK^nx/tiJ 7*- v a > i: t±, 7* 
— 9*lv h*yy OlSK^SHH L, 10 91®^^- * T h — >^ L. H 

•9^)1 O^lt^f-^Tf^ M-^-a-ft 1 0 m<0¥tfr<Ol&3kfrh. jEffi^ (*> L< \t 

■r~?-ty Y<D7*>, n - lfl©f-?TH"-->/U ailOlfiOf-^W h 
«-&tfn®OK?TOiB*a^ IE»sp (fc L < fi^7-^) £#a}-r&7Ji£-?&&o IS 
^*i£&<7>7"- ? r-tf^ ~yy~%&V>JEffim (^ L < ti-^9-^) SrJfclKL 

[0 0 14] 

hi a, *$&w<7>vy ^T*%ff-tz>tztb\znmzti&z>^y^3--9 s s*TJ* t± 

teffl£ftTV>&77^;i'A>bO£*6K7 7'f JVft£5£-^ 7 r £ A£1"& o Sc^T* jE 
*}m&Ztltz-&& : ¥-&KT ; x-va dl-l&fc^ tW^r-za ^*ffd^* 

a V^ItfU lEJSIWLi^'fit (« ; H 1 TttO. 9 5) &_h&£>ff. 4 »7L 

[0015] 

132 »i> D NA^y 7*0— L£ia-e&&o 01 6KDNAf «;^fflv>fc 
S«$e&<7)7 n-*-*- h^^-To £-f£##2 4 HDNA7*a-7*2 2 £ @ Se-ftrT £ 0 £c 

tltzm.itt 2 3 DNA7*n-7*2 2 fc/W 7*U ^X$tS 8 ^<OW:, ^^Mm 
S*Wt**«aig2 1 -C^aii-^o Cic7)^ai60^^:. #DNA7*o-y2 2 |c/\>f 7U 
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[0 0 16] 

n o m e ) . iS^jt thir#^ (fJV^'J'/F-AiTr anscr ip t ome) >. 

(7*nx*-A : P r o t e ome) ^i5»t* itf5^l^±> i|5¥jit>l^±s 

Ofni"C2fe* 0 (ome) tt^#**^»i^ ; »«*^f*1"4Sc^I&^2b ^ it 

fs^ (G e n e ) <£>:£#£^V A (Genome) s Ifc&Wto (Transcript) 
<D^fc* WY-l* (Transcr iptome) . ®6K (Protei 

n) <7)^#^-/Dx*-A (Proteome) hW&o X-'mMM&WXT '<^IE^-et±, 

So 0 3 (A) tt£te^fcfflSffcffia s fcv>tKS8^*»K ^SrtD?lf*T-*o {fit*?*, * 
3 (B) ItMB^&ltt K«ffiniMl!), -JSM^tD^I^So fllx-tfia 

(<; jtr> K) OIW«* s — ^M«0— 'W-C*&o El 4 (A) |^>^-n^#y2 (IL2 
) S6IH>^-d^ + ^2»7^7t (IL2RA) , 2 (I 

L2) $fiIH>^-n^*>2 9 (IL2RB) > -4>?-v4 ;3 c>2 ( 

IL2) 161 t-f >^-n-f 2^##>v (IL2RG) , 04 (B) fiff£®3S 
JftAJEHT-^-^l (TGFB1) tMfeMSHf^-^f^l (TGFBR1) 
> (TGFBl) iiItR»IifefiB?^-^$f#2 (TGF 

BR 2) > M«fiS^-^l (TGFBl) tif^)£fiB^-^^#3 

(TGFBR3) > 04 (C) \±j^V avj$^> (EPO) SfiftxiJ 7 tuKxf > 
(EPOR) <7)W<0'J ify K- l^-fe7°^-M#-^$)So IfcDHAfcDNAfe^I 

i,Soi3 (o ftmmKZvxm&ttmtttts&m-r n^^-f to? 

^#7J S #^-TS t & i/>x.S 0 /^-7x>f«)WtU, 0 5 l^tMA P (Mi t 

ogen Activated Protein Kinase) <D& K , 

CDMo safi?)J»?)MEK*fi?K, ME Kitted *>E R K attest tit*ft* s fcj£ S *t* 
o UTfi, ^l^ff/N^^^-Yx-^^-^ (h t t p : //www. 

biocarta. com/) £r#Mo 
[0017] 

0 3 (D) ttDNA&M^Jioafc^Offl^ffifiMfcT**), *®^«a*-eti^ V A 2: 

AiUfia^ES^ tLTV^So 0 6 A<7)0n)^^i- O t Ml 3fM#i(7)l 3 

ql2Hl3ql3 H^CO— ffiT~it, LOC222428 iifi^i^ L 0 C 1 6 0 9 7 

9 aie-f 4 @ 6 i 9 4tf Clf l^ft IT^So abai&tJiltefetto— £P<^I 

$v A<OtiMRtt^rffl-P**o 03 (E) Hfcfc^l^a'PifSWaSfc 
2b*»#-C, M«J60«fc Lt07^> foy-, 0 8<£>BU& (EClEnzyme 
Commission^ B9^-/f-7 7 5 V - <7)H5#73 S & So B7(iADPRT 

fcpf tf*L& ate?©* > Fny'-t4j 0 * > i^ny - 1 {iit^Be^j-^sesie^^f 

e gene ontology consortiums Gene ontology 
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: tool for the unification of biology. N 
ature Genetics 2 5, pp. 25-29 (2000) £#bb 0 j£ 

fny-KiitADPRTaem DNAM (DNA repair), 
ADP-'J^Mt (ADP-ribosylation) OlSSffifc^"*:* ZttffrfrZ 

0 DNAffliDNAM (DNA metabolism) CD— o-e&*)> D N A'ftift 
It&Wi^ttWi (nucleobase, nucleoside, nucleotid 
e a n.d nucleic acid metabolism) . £>— o-c,&;& 0 

[0 0 1 8] 

12 7 V>n -^-^KSIlI^^^tiS^jtfS^^^i-o 3 8 6fl<7)j||EWDNA 
fete"? t LTigifcglHSftTfe *K DNA^m£^-trDNA^StKf± 1 

n z y m e Co mm i s s i o n) i:iot*i? ftT^&o EC 1 J^<b E C 6 2"e& 
*K EC 1 K V^^-l? (Ox i d o r e d u c t a s e 8) , E C 2 fi h 9 

y 7,1 15 --if (Transferases) > E C 3 li'W Kn9--tf (Hydro 1 
ases) , E C 4 »i 'J 7 - (Lyases) , E C 5 Ji-f V ^ v - 4f (Isomer 
a s e s) > EC 6 f± U >ET — 1£ (L i g a s e s) T** 0 EClH 6fiMfcMfi*jt 
fc^-afc^r** 1 ^ *ITV>& 0 HI STJiE C 6 <7>fl]£ Li6-T>6 s . EC6liEC6. 
EC6. 5Cflfl$ili, EC6. 3fiMKEC6. 3. 1HEC6. 3. 5t"?K.ft 
m$fi2>o EC6. 3. 3<*>»£\ *M|liEC6. 3. 3. l*t»EC6. 3. 3 
3 ~C<fc o 
[0 0 19] 

12 9 fcMaH£<D=olo«"TNfc*.*--' , «- 7 r 5 'J -<DM*^-t<> 5 
*?v~y 9 ~? 4 y if~- 1 n^jxtts dna se-&« is* * - 7 * ^ 9 H9 liHtt'ftttVX 

CYPiefiWff||**W4. C YP&fcT:S*}±-£#T~#J5 0 * 
ft£>0*a^Ptt % «i6*«flfc~*:fcCYP 1 A 1, CYP 1 A 2 %i?<D9')l'-zrKfrm 

[0 0 2 0] 

123 (F) l±m.B^mm±&*y h7-^Hii:^.4^T% ^?h7-^©«iUB 

1 OOijRfll^ 12 1 lOSSKffiSftffl. 12 1 2 0ftSfttt©IS«A , *4. 

TOlvttS^tl)*), eiiLtfPu b Gen e T~ 9 * ^TfflSWflU 
7 <1 £ri s -C£ & 0 Tor-Kristian Jenssen^A liter 
ature network of human genes for high— t h 
roughput analysis of gene expression, Na 
ture Genetics, vol.28, pp21-28 #Io 12 1 0 <D&%$1 

^(Di&itttfE.^JcWiT*- 9 TxMED L I NEOR-777 h 9^ Mfc+fc#&Lfc 
#gc£^LTV>& 0 @ 1 0 Ot^i:^ JSfs^A D P RT toi^ TP 

53, CFTR, EEF2, FRA1H, SP1, D A F <D 6 D . ffilM^^ 
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^ T (i 6 &<tt t i> 1 :<OPubGen e XliXMT- 9 * t Lt*lNC 

B I <75ME D L INE^OMI Mtffl^TV^^, <?■ <7>fl&<5D:fc®;-r - 9 * "Cb ^£ to 
&v> 0 m 1 1 fc*? h7-^2 ol©«"e*4«SS«Sft«t Lft-To *i*.V7t 
;i/^^(7>D IP (Database of Interacting 
Proteins) .fcfoJ: ? ^>;^S^fflf-M-X$:ffl^TS6Iffi 
jH^ffl^P^ t t#"C§ & 0 DIP Kov>Tti I . Xenar ios^ DIP: 
the database' of 'interacting proteins, Nu 
cleic Acid Research, vol.28, 289-291, 20 
0 0#io * fiffl2f^ffl7*- ^^-^ici3V>T ^ ffiXflsffll-fc 9 >s*9%.ffl±W 

j&-Cig-£-$ *l-CW>£o fflSft^OSJsS fi> $Jx.tfJ#Sl5e$fc (Dessociation 
Constant) rt^l&ftK** ttT v**Ltf, ^HiOiB^j^S^SO-^ 

, 2@]J^±<7)^^ti]^*StT-m^?tL^+iS^ffl^, «t "jMffl^tML-Cfc d: 

£jsD i v m\-<D9 v 'S.ffl^.&mr- 9 **m^x i>fr$frti:^<> nu 

*y h 7-*<3 3oB<aedT^fc&tt#&l&£ L*1-o ft^]^<^^«B»±KE G G (h t t 
p : //www. kegg. kyoto-u. ad. j p) it t'<7)ftMU.&T - * A 
%&Mo H 3 (C) ^U^i-^ t iO-ftWSKt^aS^Jdt, }§^^ (co-enzym 

[0 0 2 1 ] 

H3tf^H 1 2 \±YJ A, hyy*9V7h-J*, ^of*- A £ ^ofclfffiftti 

9 &&tiM*fcUi. g&^lHfflJEflUH : ^>^f-f :Enzymome 

, -ftS»^#fft :^H'n-A:Met abo 1 ome) . ffiMffl (fflSLffcffl^-fMI : -f 
h-A : Interac tome) , PffW&^HfllftJift (^«^#« : n-*y 
7-A:Local i«ome) , lil i^y-AiPhenome 
) tv^;^|i|iil^l)o M. VidaUA biological atlas 
of functional maps. Cell 104, pp. 333-33 
9 (2 0 0 1) **WSU *«MJW*0»*ffett. *VA % * 'J 7r-A. 7n r 

*-Artfl!>fflS#flifc9 4 mi 3 t^&flfcflMI^JRK^^iffiSf^ffl^o^Tt) 

[0 0 2 2] 

hi 4 i: mi 5 t*HK*^ai-**:ie>o3 >tfa- tL^^s^r^-r^m^-r 

[0 0 2 3] 

3>ea.-^"fk$*t^^i£^-r^H-e**o Ell 4 till 5 K£^T, fcfcl'&m.T 

a <, 111 4 £H 1 5-Cfi«{=^!&gi:r-*£gtf3&ofc#^ JE&itky-ifr 
9 'J oT <b i^Lff *>* < <fcv^ #C7^ 9 >^tiD a t a S 

election. Data Cleansingtv^ T-f-vJ -^^^Sr^-T 

[0 0 2 4] 

DNAf 7 -?<DT- 9f&ffiK&^X^mtj:7 <i fl> 9 D '/^*^Jti 1 1-^1" 0 7 V 
) »£\ L § v^-fiS-f 3-^5 0 0 tL, 5 0 0 JilT^fl-^&JKS: 5 0 0 fc-J-*5&»<> t < ft 
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^^5es fll&tfwi 1 c o x o nfr&ttM&titfeK&'f 
[0 0 2 5] 

[Table 1]

(1) Threshold method
(2) Correlation method
(3) Modified ANOVA
(4) Bayesian t-test



7^ 'J l«T**fflM«afcifett, 41* (1) * co^ritfeT-HM U IE 

Sk* > 2«o J 9-^y>wSr*^AnafflL , C, *HH#&£tmi-£o (2) ^v>T 

tttf TV^OffllW^M&fcfli^So (3) 5 > / A f - ^f©ffiM«J![©5t$5-*i: , %T 
-*Wffll»«»fc©«*fl*tf£Jfc*fc-r*o (4) 9>rA7'-?o5 , #*?»fiT*H: 

&o 

[0 0 2 6 ] 

7^?>J>/il«l«ti4*l)tiMlfttliWtt4AHOVA (Analysis o 
f Variance) t «TV»*. fi L^i£ift£ A N 0 V A HU 7*- * |^±>6 5 Sv^i^ 

j&*flf t, & i t tflrttm £ *t S <7>-e, ANOVA^iittti") 

[0 0 2 7] 

7-f JV* 'J >?&<D >f X«ISEttfcov» Tl^^ tSo £ITC y3fcCy5fcv» 

5— feO!>&3fcfe3RT, 2ffi^-9->^JV*^LT|^l^lC/N-f 7*iJ ^-f a >-1"&D 
NAf7 7' *&£SI5eTSo ^tfK i-RNAtZotS-m, Jt73£Cy3> 9 >t 
#£Cy5-C«»U y' ±X~M&^4 7' 'J (f*^* *Cjfc»i5^ 

1 OftgMS) £fr 9 i t£#X-&o Z<v^m<OT~ Cy5/Cy3^f 
= cttl. 0T*4^ ^ta' Kov>Ttt*fcI-F**jE;ft«fcBflN (c, a 2 ) tf^* 
n<Ofcfk&«* IY1.Y2, • • • , Yi, • • • , Ynj SrtttBU -e^fllSMK 
y= (yl, y 2, •••.yi, ••-.yn) t«at4,.f 7 7* 
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btlZtztb I Y 1 , Y 2, ■ Yi, • • • , Yn| fiffiSfc«Lfc (i n d e p e n 

dent 1 y and identically distriuted; i.i d 

) tlUo 

[0 0 2 8] 
[Equation 1]

$$Y_i \overset{\text{i.i.d.}}{\sim} N(c, \sigma^2) = N(1, \sigma^2) \qquad \text{(Equation 1)}$$



(1. o* ) (Do 2 ZMfc-ftltf, 5$1 <t |K Y i <£> a 2 — -C&;|> 0 ^ X#t5eTfi 

^ *-K- (jRSfe*«*^:i:Jflr*A) €*EM i LTSH^gMgjHEM (Highes 
t Density Region; HD R) ^ s #f>*L& 0 9 0 96*liS3&j£EfflfdU nfc 

Sl^Wfi:!, IY1, Y2. • • •, Yi, • • Yn| <omtif&m&&fr1fi 
( a 1 < Y 1 ^ b 1 , al < YI ^ bl, • ■ - , an < Yn 20 

^ b n#igiSKi«jfc3*L*tesp3&jK##) p ( y «| c, o 2 ) 
[ 0 0 2 9 ] 
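[Equation 2]

$$p(y \mid c, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(y_i - c)^2\right) \qquad \text{(Equation 2)}$$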



■^b^y- (yl, y2, • • •, yi, • • • , y n ) i)^k. h fltz t Z <DXSt 
mm. 1 U 2 I y) fi. 

[ 0 0. 3 0 ] 

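[Equation 3]

$$l(\sigma^2 \mid y) \propto \prod_{i=1}^{n} \frac{1}{\sigma} \exp\!\left(-\frac{1}{2\sigma^2}(y_i - c)^2\right) \qquad \text{(Equation 3)}$$

[Equation 4]

$$l(\sigma^2 \mid y) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{n s^2}{2\sigma^2}\right) \qquad \text{(Equation 4)}$$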



[0 0 3 1 ] 
[Equation 5]

$$s^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - c)^2 \qquad \text{(Equation 5)}$$

-^ffcoW^p U 2 ) t LT, HIWfM (noninformati 

v.e prior distribution) &Ufe-t*> 0 ^ OiEjStt-Sftfclll LTi 

Cli-iCMtiJ: ? W^^Uo K li "7 y v ^ -If ^fi^^&K 

[0 0 3 2] 
[Equation 6]

$$p(\sigma^2) \propto \sigma^{-2} \qquad \text{(Equation 6)}$$



t-r tut a^o <>t *) a 1 oiiMp u 2 ) »i, a- 2 -r^^*»s^i:-r?>o 

[0 0 3 3] 
[Equation 7]

$$p(\sigma^2 \mid y) \propto l(\sigma^2 \mid y)\, p(\sigma^2) \qquad \text{(Equation 7)}$$



[0 0 3 4] 

[Equation 8]

$$p(\sigma^2 \mid y) \propto (\sigma^2)^{-(n/2+1)} \exp\!\left(-\frac{n s^2}{2\sigma^2}\right) \qquad \text{(Equation 8)}$$



J: 19, a 2 0»?£#*p (a 2 I y) z - 2 (n, n s 2 ) kmL^frjpK%2> 0 % 

Uo 2:" 2 a) <D¥%)li*/ (v-2) , (J&&7&*ft:*:fc&&A) t± a / 40 

(«/ + 2) t56 s ^^oTv^c7)-e N a 2 <D&W.fcmt LT, 
[0 0 3 5 ] 
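[Equation 9]

When the mean is used as the point estimate:

$$\sigma^2 = \frac{n s^2}{n - 2} \qquad \text{(Equation 9)}$$

[Equation 10]

When the mode is used as the point estimate:

$$\sigma^2 = \frac{n s^2}{n + 2} \qquad \text{(Equation 10)}$$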



[0 0 3 6] 

$$\frac{n s^2}{\sigma^2} \sim \chi^2(n) \qquad \text{(Equation 11)}$$



<7)HI^#£,:ft.;g> 0 5* 1 1 (n) tit, liSnO*^-*^**. idTIi 

, ns 2 *> s @5efil (MH£) , * 2 *Ws£SfcfcfcoTv>*o jU Uil^ffl^^ut 
"C, HDR££i?>&£ t7& s -e|:^o 5^9, 10, ll^^^Sit?, 2»tV/ 

ftMl:KM&&^£&&£<!: 1. o <fc »9 ^t:KM&&iti££;f -f ^jitfs^ 

For example, with five spots for which $y_1 = 1.4$, $y_2 = 0.89$, $y_3 = 1.24$, $y_4 = 0.91$, and $y_5 = 1.04$, $s^2 = 0.04788$. With the mean as the point estimate (Equation 9), $Y_i \sim N(1, 0.0798)$; with the mode as the point estimate (Equation 10), $Y_i \sim N(1, 0.0342)$. From Equation 11, $\sigma^2 \sim 0.2394\,\chi^{-2}(5)$, and the 90% HDR of $\sigma^2$ is 0.019 to 0.177.

[0 0 3 7] 

it^tiSo h U l 6 H^1- 0 Ifi?7/f-> a ^£?T (Raw 

) , t>^7;r->3V^J (Co l umn) , *fl5tSMV^^ (fl!)x.tfflri&<*> 
C y 5 t C y 3 OJt^ : Cy5/Cy 3) '^vh'J^^i: Lfc«5t-C<b4 0 u^Sv h 

>^KjaL3fe«att44o ^HIv h 'J ^^fi^Jx.tfC S V^5£**7'IZi5lJ«5£4*' 

[0 0 3 8 ] 

tifc&Wtt^gft (7-A-aMXK^7? K) Cis^t^'i^Mi.o 

ffi, *!ISUfl-*ftov»T»4S p 1 u B*Otttttt»^? ^-->* (Splus^classA 

-77 htS p 1 u s^n-yt^Rfi©e 1 0 7 1 ^---^SrJS^T^fr* - * 

[0 0 3 9 ] 
[Table 2]

Supervised methods

(1) Nearest neighbor
(2) Discriminant analysis
(3) Support vector machine
(4) Characteristic rule



tO 0 4 0] 
[0 0 4 1 ] 

riF (^fc#2&) THEN J tv^^©I F-THEN 

£fi£L#£ I F-THEN/l'-^OjStli, -Hftfc^Sfefcfcfco IF-THE 

t^ s #^;v-;vi*<7)g^T**^>o #i f-t een ^- ^tfttgLT- ? <r>m&* £*<om&l 

<7>ie*^*> ■& *-&tr ), igsfeaj^Bi: -r a , riF a then bj 

BHfiR*A* (A-B) m^i^iEitJo 

$$\mu(A \to B) = P(A)^{\beta} \times \log\frac{P(B \mid A)}{P(B)}$$

tfC, P (A) " p \t P (A) O^f^tftt^, P (A) \±ttgLr-?<D*X-lk 



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z?)i'<DM'&-*m-fo mm-, p (b) imws&Btim&z hzw, p (bia) a 

RBt<Dfe&&Kis\}ZP (A) **/-«-^, P (BIA) ^spiDftf, Ctt«bti> 
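
As an illustration of how the rule measure above behaves, the following Python sketch evaluates P(A)^β × log[P(B|A)/P(B)] for a rule "IF A THEN B". It is a sketch under assumptions: the function name rule_utility, the default β = 0.5, and the use of the natural logarithm are hypothetical choices that the text here does not fix.

```python
import math


def rule_utility(p_a, p_b, p_b_given_a, beta=0.5):
    """Score an IF A THEN B rule as P(A)**beta * log(P(B|A) / P(B)).

    p_a         -- fraction of records satisfying the condition A
    p_b         -- fraction of records satisfying the conclusion B
    p_b_given_a -- fraction of records satisfying B among those satisfying A
    beta        -- assumed weighting exponent (hypothetical default)
    """
    return (p_a ** beta) * math.log(p_b_given_a / p_b)


# A rule whose condition covers 25% of the records and doubles the rate of B.
print(rule_utility(p_a=0.25, p_b=0.5, p_b_given_a=1.0))
# A rule that does not change the rate of B scores zero.
print(rule_utility(p_a=0.3, p_b=0.5, p_b_given_a=0.5))
```

A larger β favours rules whose condition covers many records, while a smaller β favours rules that raise the probability of the conclusion sharply even when they cover few records.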

[0 0 4 2 ] 

4)S?tL#S#|![?)IF-THENJ^^^?), iHffif£<7>*§^> Jt«a94?jBcO^- 

. t u, *#asnrai»*cs*t*ifcsg8:o±is (eox.tf> 3) -e<7)tsHi*iT-nrtg 

[0 0 4 3 ] 

-THENJW-;v£*f&o 
[ 0 0 4 4 ] 

) ate^s^wiai-r*^ -^hje^w^v^ji (jc^-^ofiv^ns) ioja±-e 

1 0«^?> 2 0 01@> fif $ L< (±2 0-fS^ib 5 O-BO iOlSftfi? 
[ 0 0 4 5 ] 

^^-^^^^Ji: UT, 09;tl*B S S/WS S Jt*Jflv» &„ 4*BSS 
(i Between— group sum of squares, WSS it W i t h i n 
-group sum of s qu a r e sO«fib>), tfif j WBSS/WSSJk 

[0 0 4 6 ] 
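[Equation 12]

$$\frac{BSS(j)}{WSS(j)} = \frac{\sum_{i}\sum_{k} I(y_i = k)\,\bigl(m(x_{kj}) - m(x_{\cdot j})\bigr)^2}{\sum_{i}\sum_{k} I(y_i = k)\,\bigl(x_{ij} - m(x_{kj})\bigr)^2} \qquad \text{(Equation 12)}$$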


liUm (x. j ) tt^sBfrlfcfcStfaite^ j <7>TO^m*. m (x k j ) Ji, * 7* k 
HJKt-&^#H*5ft*^f5? j Wf^W, x i j ttit^-^l&m-r* 
I(yi = k) li, y i = k <D h i 1 > Zfil&ftli 0 t &&l!S&-e&&o 5£ 1 2 
T~g?*L& B S S/WS SJfc#*#^l2£\ * 7 7.1*1^^31 tft&LLX? ? Xffl<Dfflm 
^iif>Of, ^9->7-f *-Va ><7>8IK»iB S S/WS SJt**^:*v»jte-?-% tfflx-tf 

[0 0 4 7 ] 

r-va yftWZftlo 7;f->3 >C(ii3^f>il 3 S TO fill * ft+gSfpfflO? *> 

*ft-otzm*7ji1-o ->^8#»(IJS *tfc»e-?»*«P robelHPi 

o b e 5 T*ofc»£\ ttUDNAf rKfN^fc-BEftTi&fcite^* 
(U n i g e n e > il^S2?iJ#-^ (G e n b a n k . KjbDX-N *fc 
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[0 0 4 8] 
[Table 3]

Probe  | Unigene  | Genbank  | Gene name                                                              | Chromosome | Gene Ontology annotation
Probe1 | Mm.19904 | AA413214 | Bcl2-associated X protein                                              | 7          | integral membrane protein; apoptosis regulator; apoptosis
Probe2 | Mm.98    | AA450909 | proteasome (prosome, macropain) subunit, beta type 6                   | 11         | 20S core proteasome; peptidase, proteasome endopeptidase; ubiquitin-dependent protein degradation
Probe3 | Mm.5341  | AA065510 | defensin beta 1                                                        | 8          | extracellular; antimicrobial peptide; defense response, xenobiotic metabolism
Probe4 | Mm.24816 | AI643210 | coagulation factor II (thrombin) receptor                              | 13         | membrane, integral membrane protein; blood coagulation factor; G-protein coupled receptor protein signaling pathway
Probe5 | Mm.850   | AA106360 | signal recognition particle 14 kDa (homologous Alu RNA-binding protein) | 2          | signal recognition particle; RNA binding; protein targeting

StProbeltProbe4t, integral membrane prote 
i nrt^lLTji^ft&o m&Wb LTJH (integral me 

mb r a n e) KftftTZ j&fr^&tf? 7 ^^Kfifco TV>& t v> •? mHJ#igA,-ev> & 

i££ tLTi/^ST'n — /tj^ L> integral membrane protein ( 
GOff 10 0 1 6 0 2 1 ) KMJ5tS7'n-7 , t^-)ft^#^|g:ifr, -£*IK§^-T 
^lfE?^«*T'f- *v>f • ^n^/^U-r-->a y-t&ZkX-, i n t e g r a 

1 membrane p r o t e i ni s ^H ?7X^It|^t£i^£t65t 
%EM~t2> C So t/c integral membrane protein (G 50 
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0#-f- : 0 0 1 6 0 2 1) <D±V%mi±, membrane (GOf-f" 10 0 1 6 0 2 0 
) -C&&o ZZ-X%\]<DT~ ~y 7X*\±, DNAf? tlX^2>?v-7 

integral membrane protein (GO I 0 0 16 0 
21) Udimembrane (GOff : 0 0 1 6 0 2 0) KttB-t2> W ~ f t 

Wert x, zii^m%-$-2>3tB : f-&<Dfrx*T- -yy • t n*^<; r 

— is a > LT i) £ ^ o HiK "integral membrane protein&e^ 
tf" Zftte&ftkLT^J-yyZft-otzWi&t "raembranem" ££DilUM* 
t LTv^ =~y y^ftotzWi'ktXi' n^^'J r-'y a y&^k (JEfWfc L < 
)' Z&mLtzWsi?^, 1&%<DJEM^rf&%£ y)&\ftllf? 7XfrMKl±^ m emb r a n 
e <> integral membrane protein # s ji- lc"C& %>c\b 

u-J-<d£ Hi 3 30^11 1 3 fci* L^SHH^^ |3©J:^^^ ^fr^HBiS 

g^me^U 7> h^fb^i-^XflT^^o mi!lttatii^i3^^T> g4 ^^6 K^Lfc 

^ n^-f y)&bti?cA^iite : ¥-&tZ<DTST-l/a ( n + 1 ) * 

-<7) x -yy^mToM-rnmn, ^-^^x^hntzMH 

^M^l-o^tliT. G. Graeber and D. Eisenberg, Bio 
informatic identification of potential a 
utocrine signaling loops in cancers from 

gene expression profiles. Nature Geneti 
cs 2 9, pp. 2 9 5 - 3 0 0 (2 0 0 1) iSIfflMiiCov^ 
13 H . Ge^j, Correlation between transcriptom 
e and interactome mapping data fromSacch 
aromyces cerevisiae. Nature Genetics 29, 

pp. 482-486 (2 0 0 1 ) £-£tt-e*t##K L£ 0 <I U ttri£<7) 2ft<D£$Q 

hi>\ M.®<r> ii@(?)v^^>^i:(tLAit2»M , 7 h <; ^*£&^-r&;£« 

5:I/7LTv>SW^So ^0BJ*ffl#l-|e«? J: -? Kiitr[lJ<7) v-i' -y ^'B^jt^ 

[0 0 4 9] 
[Table 4]

[Table 5]

[Table 6]
•fe^-jtaT-fc L < tt'J Fittest V-b7'^-it{5^^)0^'^tf^#t U 
33-^HI^ttia^^^-fe-/^-ma : ?-C*ti-*f> TWKftlJKAfrSryjtr^ Kitted t< 

[0 0 5 0] 

|c? < fil^ftt >) x ^ P A±?)±lj||E? , Ttife^, /N'X^^-f P AfcffilW*- 
^ P B±<7)itf5T-> & L< Wtfltffiv>-f tifr<D3&te : ?'<DM*&tr4kftk-f2>o 
[0 0 5 1 ] 

fc^aUfefettC A±K&*Uf. 7/t"^3 $ *tfc*att^»JUOttH*-^ < *&I 

c A±<7)p^jtf5^ is l< iiiriBo«iffis*tfcjte^ttiriEo»«atfe 

[0 0 5 2 ] 

*?*5iMKISjEailtBA^y-<*>*>' hn->*-*«OA-C*tL*f, 7/r->a ^i^ttflJ? fl 
fz &&&*mm 14 £o* < #J 3K&fP £ > h n ->* - O A <75 ± |?&® <D * > h n ->* - * ^i" S 

"Tic 

[0 0 5 3 ] 

S4d*e>g6tfev»T*f£ : J : MOffl3LlW«* s B*«j&Woe5| (EC) T*o 

* 9 * fl^MaEJllliiJ Ste^ E C E C A *l *f > Tyf-v'aVi^Niaj?^ 
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&m&*mm k& ^ < e c a <d±^m Km-r & e c a t m 

[0 0 5 4] 

>55^Jfim$tL^*ii14^MliJ't4^Eo*<^m^#^^-A-^^-7r 5 'J - S F.Ai:« 

it^fc^SLSr-S-tf^t-f *e . ... - 

[0 0 5 5 ] . . 

[ 0 0 5 6 ] 

«n sua m t mtzv m e Mws.<f m * h m s *t* * <ou tr 

tti, 
[0 0 5 7] 

^4 tf»t>£ 6 fcfc^Titfc^M^fflSIW^* 5 * ? h 7 - * T^o-ft»ilSKfll«-e2bo fe» 

*H9§t*v»T, Hi 4, 115, Mtfrh 6 KfflTFLtz^&Zm^TKlg.'vr 

ft*) J: !J— JlKte^jgSite (n;U *«SEv^ *fclfft (m eta-know 

1 edge) £g|&T*§i> 

[ 0 0 5 8 ] 

£mr- 9 -?$>2>& l &&Sim<DT- 9 *zv h (Go l ub^l 9 9 9#) «r«ffl L, A# 
fl9*HWO*MS^to ^-^^T'fMiMITOt'f h^t>?* 7>n- r * L£ 
( h t t p : //www. genome, wi. mi t. edu /M PR) » Z<Df*— 9 
-tr y Mi , 2 ffiffio lite 6 ifiL#l : A L L (acute lymphoblastic 1 
eukemia) , AML (acute myeloid leukemia) h 
Jfil?fc£SURU Aff ymetr ixtff^ (681 7 itfc^) Srffl^TlSafi'** 
«jeU<)0-e*4o ^tf7 2 A (F*U 3 8 A**B - c e 1 1 ALL, 9 

A**T- cell ALL, 25 A* S AML) Tsbio iOf-?ty h ( 7 2 X 6 8 1 
7) Srffl^T, ll^-^O^^ 2fl^<7>6jfc^£* 7 v7-r $--->a V "TS 
i t* 5 T-i Si^ito &T\ 2ii<06M (ALL, AML) ZZH^tl, 99* 
0, It <m.m.-f%>o -T^fe*>, ALL=^7^0, AML=^7^1tn o & 33^146 
Ifll^KT)^- 9-ty Y (Golub^l999 #) fi, 7 2^#r f - 9 £ 3 8^M^--> 

h (d a t a_s e t _A L L AM L train, txt) 3 Ao>t7>\> 

•ty h (data_set _A L L AM L_ independent, txt) t'£Ht 

TJt&SftT^&o dd-C»i3 8#Mfc ( M^-->^-fey h) trffl^t^SS: 

J3£&v>, -€-^>|S**Jf v»T 3 44£# (r7, r-fey M ^^7-/7^-/3 >*Ufr£z 

[ 0 0 5 9 ] 
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<DT-9-tv hZttgir- 9 t Ltco Cl<7>9 %<D§ 8 1 7JR14»i. D N Af ? T'-CifiSSE L 
^I^lol^HIicMU ?S*5 O lJS1*f±, 6JflL*S?>tt3K (ALL, AMD 

jh&ai zlk r*j r+j ivm <0 3*x=rv^^Mt7t o 7 2iot>^ 

^fig-fet^o £ fcV SJM^R coffin 1 i: 2ffi3K^>8tM (^5X0. 
> 9-7 Zl) 4<7)-C> -e^tffflv^Tto 
[0 0 6 0] 

&{S^K*|-)&-r£ 6 8 1 7Jg'l££ I F-THENiV-JV^MB, SM^tt^^^jS 

BSfi 1 b LT,. fFM^±i£2. 0l®co;i— )W?r^i9mL^^*^g7 K^-fo 
# s l o<7) I F-THENJV-JV^MLT^^ iMfi-fito;*: £ ^MK±:d^MA,-e^& o 
UKOmimt, I F - T H E N ^- JV<7)^#^^M)£ L , ^2^!l, #.3.?!U SMyiJii-?- 
)V~)V<dWMM. K7f$, #^-^K*j-I&T&o JV-;v<7)|g^Ji^:T 
<D)^-)vc [&MM<dMM = 99X0} -eiwi— ^*-eii€B&L^ 0 #)xJf, 
^lfrfiTlF U07139_at = THEN S IfiL#l OitM = 9 7X0 
J £ ;v-;vc£> *) y Tuo 7 l 3 9_a tj t ^ ? jtfs^o^ms^ s r^cj -e£>££ 
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